Ensemble Dependence Model Based Cancer Classification using Gene Microarray Data

Author

  • Peng Qiu
Abstract

DNA microarray technologies make it possible to monitor the expression levels of thousands of genes simultaneously. A topic of great interest is the difference in expression profiles between microarray samples from cancer patients and normal subjects, studied by classifying the samples at the gene expression level. Various clustering methods have been proposed in the literature to classify cancer and normal samples based on microarray data, and they are predominantly data-driven approaches. In this paper, we propose an alternative, model-driven approach: an ensemble dependence model that explores the group dependence relationships among gene clusters. Under a hypothesis-testing framework, we use the genes' dependence relationships as a feature to model and classify cancer and normal samples. The proposed classification scheme is applied to five cancer data sets and yields very promising performance. We further analyze the eigen domain of the proposed method and discover different patterns between cancer and normal samples.

I. BACKGROUND AND PROPOSED SCHEME

Current methods for the classification of microarray gene expression data fall mainly into two categories. The first is based on clustering, which can be used to distinguish cancer samples from normal samples and to separate subtypes of cancer. Example schemes include hierarchical clustering, local maximum clustering, self-organizing maps, and K-means clustering. These clustering methods are mainly data-driven: they usually require few prior assumptions, such as an underlying model. However, determining the number of clusters is itself a challenging problem, and there is a lack of widely accepted measures for evaluating clustering performance.

The second category is based mainly on machine learning. Motivated by the success of machine learning algorithms in image and speech processing, many studies have applied them to microarray data analysis. Examples include support vector machines and neural network analysis. Machine learning methods generally yield better results than the traditional clustering methods.

In this paper, we propose a classification approach based on an ensemble dependence model, as illustrated in Fig. 1(a). It has four main components: feature selection, gene clustering, the ensemble dependence model, and hypothesis testing. Because the size of current data sets is limited, it is not feasible to examine the regulation relationships between all genes. The microarray gene expression data are also noisy. However, if genes are clustered appropriately, the noise level in the resulting cluster expression is reduced, and the ensemble dynamics of the gene clusters can be revealed.

Fig. 1. (a) Classification procedure; (b) ensemble dependence model.

Since not all genes' expression profiles are informative for understanding the difference between cancer and normal samples, feature selection is needed to exclude irrelevant genes. As mentioned above, gene clustering is performed to group together genes with similar expression. To average out experimental noise and enhance the genes' common expression within each cluster, the average gene expression profile is used to represent each cluster. Without any prior knowledge, we assume that each cluster is to some extent dependent on all the other clusters, as shown in Fig. 1(b). A linear dependence relationship is studied in Equation (1), where a_ij represents an inter-cluster dependence relationship. The so-called self-regulation is assumed to be zero, i.e., a_ii = 0 for i = 1, 2, 3, 4. Because the cluster average is used to represent each cluster, the intra-cluster dependence relationship is averaged out.
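The steps described above (cluster averaging, a linear inter-cluster dependence with zero self-regulation, and a decision between the cancer and normal hypotheses) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. Equation (1) is not reproduced in this text, so the form x_i = sum over j != i of a_ij * x_j + noise is assumed here. The least-squares fit, the Gaussian residual score used in place of the paper's hypothesis test, the toy data generator, and all function names are hypothetical choices made for illustration.

```python
import math
import random

def cluster_averages(sample, labels, k):
    """Average the expression values of the genes in each of k clusters.
    Assumes every cluster contains at least one gene."""
    sums, counts = [0.0] * k, [0] * k
    for value, c in zip(sample, labels):
        sums[c] += value
        counts[c] += 1
    return [s / n for s, n in zip(sums, counts)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_dependence(X):
    """Assumed model: x_i = sum_{j != i} a_ij * x_j + n_i, fitted per cluster
    by least squares. X is a list of samples, each a list of k cluster
    averages. Returns (A, var): A has a zero diagonal (no self-regulation)
    and var[i] is the residual variance of cluster i."""
    k = len(X[0])
    A = [[0.0] * k for _ in range(k)]
    var = [0.0] * k
    for i in range(k):
        others = [j for j in range(k) if j != i]
        G = [[sum(x[p] * x[q] for x in X) for q in others] for p in others]
        g = [sum(x[p] * x[i] for x in X) for p in others]
        for j, a in zip(others, solve(G, g)):
            A[i][j] = a
        res = [x[i] - sum(A[i][j] * x[j] for j in others) for x in X]
        var[i] = max(sum(r * r for r in res) / len(X), 1e-12)
    return A, var

def residual_score(x, model):
    """Simplified Gaussian score of the model residuals (not a full
    likelihood of x); higher means x fits the model better."""
    A, var = model
    k = len(x)
    score = 0.0
    for i in range(k):
        r = x[i] - sum(A[i][j] * x[j] for j in range(k))
        score += -0.5 * math.log(2 * math.pi * var[i]) - r * r / (2 * var[i])
    return score

def classify(x, cancer_model, normal_model):
    """Pick the hypothesis whose dependence model explains x better."""
    better = residual_score(x, cancer_model) > residual_score(x, normal_model)
    return "cancer" if better else "normal"

def simulate(sign, n, rng):
    """Hypothetical toy data: cluster 1 follows sign * cluster 2 plus noise."""
    samples = []
    for _ in range(n):
        x2, x3, x4 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        samples.append([sign * x2 + rng.gauss(0, 0.1), x2, x3, x4])
    return samples

rng = random.Random(0)
cancer_model = fit_dependence(simulate(+1.0, 50, rng))
normal_model = fit_dependence(simulate(-1.0, 50, rng))
print(classify([1.0, 1.0, 0.2, -0.1], cancer_model, normal_model))
```

In this toy setup the two classes differ only in the sign of one inter-cluster coefficient, so the cluster averages alone look similar across classes while the fitted dependence matrices differ. That is the kind of feature the model-driven approach is meant to exploit.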


Similar resources

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm, called SFLA-FS, based on the Shuffled Frog Leaping Algorithm. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....


Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data give access to a wealth of information across thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...


Ensemble dependence model for classification and prediction of cancer and normal gene expression data

MOTIVATION DNA microarray technologies make it possible to simultaneously monitor thousands of genes' expression levels. A topic of great interest is to study the different expression profiles between microarray samples from cancer patients and normal subjects, by classifying them at gene expression levels. Currently, various clustering methods have been proposed in the literature to classify c...


Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...


An Ensemble Classification Model for the Diagnosis of Breast Cancer Using Stacked Generalization

Introduction: Breast cancer is one of the most common types of cancer whose incidence has increased dramatically in recent years. In order to diagnose this disease, many parameters must be taken into consideration and mistakes are possible due to human errors or environmental factors. For this reason, in recent decades, Artificial Intelligence has been used by medical practitioners to diagnose ...




Publication date: 2005